Graph Neural Networks are Inherently Good Generalizers: Insights by Bridging GNNs and MLPs
Graph neural networks (GNNs), the de facto model class for representation
learning on graphs, are built upon the multi-layer perceptron (MLP)
architecture with additional message-passing layers that allow features to flow
across nodes. While conventional wisdom commonly attributes the success of GNNs
to their advanced expressivity, we conjecture that this is not the main cause
of GNNs' superiority in node-level prediction tasks. This paper pinpoints the
major source of GNNs' performance gain to their intrinsic generalization
capability, by introducing an intermediate model class dubbed
P(ropagational)MLP, which is identical to a standard MLP during training but
adopts the GNN architecture at test time. Intriguingly, we observe that PMLPs
consistently perform on par with (or even exceed) their GNN counterparts, while
being much more efficient to train. This finding sheds new light on the
learning behavior of GNNs and can be used as an analytic tool for dissecting
various GNN-related research problems. As an initial step
toward analyzing the inherent generalizability of GNNs, we show that the
essential difference between MLPs and PMLPs in the infinite-width limit lies in
the NTK feature map in the post-training stage. Moreover, by examining their
extrapolation behavior, we find that although many GNNs and their PMLP
counterparts cannot extrapolate non-linear functions for extremely
out-of-distribution samples, they have greater potential to generalize to test
samples near the training data range, a natural advantage of GNN architectures.

Comment: Accepted to ICLR 2023. Code at https://github.com/chr26195/PMLP
Lyman-α polarization from cosmological ionization fronts: I. Radiative transfer simulations
In this paper, we present the formalism for simulating Lyman-α emission
and polarization around reionization (z = 8) from a plane-parallel ionization
front. We accomplish this by using a Monte Carlo method to simulate the
production of a Lyman-α photon, its propagation through an ionization
front, and the eventual escape of this photon. This paper focuses on how the
input parameters, the ionization front speed, the blackbody temperature, and
the neutral hydrogen density, affect the intensity and polarized intensity
seen by a distant observer, and we report the resulting ranges of both
quantities in erg/cm²/s/sr. We found that a higher front speed, a higher
blackbody temperature, and a higher hydrogen density all contribute to a
higher intensity as well as polarized intensity, with the strongest dependence
being on the hydrogen density. The dependence on the viewing angle of the
front is also explored. We present tests to support the validity of the model,
which make it suitable for further use in a following paper, where we will
calculate the intensity and polarized-intensity power spectra on a full
reionization simulation.

Comment: 29 pages, 13 figures, to be submitted to JCAP
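As a rough illustration of the Monte Carlo machinery described above, the toy sketch below follows photons scattering isotropically through a slab of fixed optical depth and tallies the escape fraction. The constant opacity, isotropic re-emission, and absence of frequency redistribution or polarization are simplifying assumptions that fall far short of the paper's full radiative transfer; the function and parameter names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def escape_slab(tau_slab, n_photons=100_000):
    """Toy Monte Carlo: photons injected at the bottom of a slab of total
    optical depth tau_slab, scattering isotropically; returns the fraction
    escaping through the top. A simplification of Lyman-alpha transfer
    (no frequency redistribution, no polarization)."""
    escaped = 0
    for _ in range(n_photons):
        tau = 0.0              # optical-depth coordinate inside the slab
        mu = 1.0               # direction cosine; injected moving "up"
        while True:
            step = -np.log(rng.random())   # optical depth to next event
            tau += mu * step
            if tau >= tau_slab:            # escaped through the top
                escaped += 1
                break
            if tau <= 0.0:                 # escaped back out the bottom
                break
            mu = 2.0 * rng.random() - 1.0  # isotropic re-emission
    return escaped / n_photons

print(escape_slab(tau_slab=2.0))
```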
Advective Diffusion Transformers for Topological Generalization in Graph Learning
Graph diffusion equations are intimately related to graph neural networks
(GNNs) and have recently attracted attention as a principled framework for
analyzing GNN dynamics, formalizing their expressive power, and justifying
architectural choices. One key open question in graph learning concerns the
generalization capabilities of GNNs. A major limitation of current approaches
is their reliance on the assumption that the graph topologies in the training
and test sets come from the same distribution. In this paper, we take steps towards
understanding the generalization of GNNs by exploring how graph diffusion
equations extrapolate and generalize in the presence of varying graph
topologies. We first show deficiencies in the generalization capability of
existing models built upon local diffusion on graphs, stemming from the
exponential sensitivity to topology variation. Our subsequent analysis reveals
the promise of non-local diffusion, which advocates for feature propagation
over fully-connected latent graphs, under the assumption of a specific
data-generating condition. Building on these findings, we propose a novel
graph encoder backbone, the Advective Diffusion Transformer (ADiT), inspired
by advective graph diffusion equations that admit a closed-form solution and
come with theoretical guarantees of the desired generalization under
topological distribution shifts. The new model, functioning as a versatile
graph Transformer, demonstrates superior performance across a wide range of
graph learning tasks.

Comment: 39 pages
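The local vs. non-local distinction drawn above can be caricatured in a few lines: one explicit Euler step of diffusion along the observed edges, versus one step over a similarity-weighted fully-connected latent graph. The function names, the row-normalized adjacency `adj_norm`, and the softmax similarity are illustrative assumptions for this sketch, not the ADiT architecture itself.

```python
import torch

def local_diffusion_step(x, adj_norm, dt=0.1):
    # dx/dt = (A_hat - I) x : features flow only along observed edges,
    # so the dynamics inherit (and are sensitive to) the input topology.
    return x + dt * (adj_norm @ x - x)

def nonlocal_diffusion_step(x, dt=0.1, temp=1.0):
    # Diffusion over a fully-connected latent graph whose weights come
    # from feature similarity (attention-like), decoupling propagation
    # from the observed topology.
    sim = torch.softmax(x @ x.T / temp, dim=-1)
    return x + dt * (sim @ x - x)
```

Under a topology shift, `local_diffusion_step` changes with `adj_norm` while `nonlocal_diffusion_step` depends only on the features, which is the intuition behind the robustness argument for non-local diffusion.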
DIFFormer: Scalable (Graph) Transformers Induced by Energy Constrained Diffusion
Real-world data generation often involves complex inter-dependencies among
instances, violating the IID-data hypothesis of standard learning paradigms and
posing a challenge for uncovering the geometric structures that underlie the
desired instance representations. To this end, we introduce an energy-constrained
diffusion model which encodes a batch of instances from a dataset into
evolutionary states that progressively incorporate other instances' information
by their interactions. The diffusion process is constrained by descent
criteria with respect to a principled energy function that characterizes the
global consistency of instance representations over latent structures. We
provide rigorous theory that implies closed-form optimal estimates for the
pairwise diffusion strength among arbitrary instance pairs, which gives rise
to a new class of neural encoders, dubbed DIFFormer (diffusion-based
Transformers), with two instantiations: a simple version with linear
complexity for settings with prohibitively many instances, and an advanced
version for learning complex structures.
Experiments highlight the wide applicability of our model as a general-purpose
encoder backbone with superior performance in various tasks, such as node
classification on large graphs, semi-supervised image/text classification, and
spatial-temporal dynamics prediction.

Comment: Accepted at the International Conference on Learning Representations (ICLR 2023)
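A hedged sketch of the linear-complexity trick behind the simple instantiation: all-pair propagation with non-negative scores of the form 1 + q·k can be computed without materializing the N×N attention matrix, by reassociating (QKᵀ)V into Q(KᵀV). The layer below illustrates only this reassociation under those assumptions; the exact DIFFormer kernel and update rule differ in details.

```python
import torch
import torch.nn as nn

class LinearDiffusionLayer(nn.Module):
    """Sketch of linear-complexity all-pair propagation: pairwise scores
    1 + q_i . k_j are never materialized as an N x N matrix."""

    def __init__(self, dim):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)

    def forward(self, x):
        # L2-normalize so every score 1 + q_i . k_j is non-negative.
        q = torch.nn.functional.normalize(self.q(x), dim=-1)
        k = torch.nn.functional.normalize(self.k(x), dim=-1)
        n = x.shape[0]
        # Numerator sum_j (1 + q_i . k_j) x_j in O(N d^2), not O(N^2 d):
        num = x.sum(0, keepdim=True) + q @ (k.T @ x)
        # Matching normalizer sum_j (1 + q_i . k_j):
        den = n + q @ k.sum(0, keepdim=True).T
        return num / den
```

The reassociation is exact, so the output matches the quadratic-cost computation while scaling linearly in the number of instances.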
CD8(+) T Cells Involved in Metabolic Inflammation in Visceral Adipose Tissue and Liver of Transgenic Pigs
Anti-inflammatory therapies have the potential to become an effective treatment for obesity-related diseases. However, the large differences between the human and rodent immune systems limit drug discovery. This work aims to construct a transgenic pig model with a higher risk of metabolic diseases and to outline the immune responses at the early stage of metaflammation using a transcriptomic strategy. We used CRISPR/Cas9 techniques to knock in three humanized disease-risk genes: GIPR(dn), hIAPP and PNPLA3(I148M). The transgenic modifications increased the risk of metabolic disorders. Triple-transgenic pigs under a short-term diet intervention showed early symptoms of type 2 diabetes, including glucose intolerance, pancreatic lipid infiltration, islet hypertrophy, hepatic lobular inflammation and adipose tissue inflammation. Molecular pathways related to CD8(+) T cell function were significantly activated in the liver and visceral adipose samples from triple-transgenic pigs, including antigen processing and presentation, T-cell receptor signaling, co-stimulation, cytotoxicity, and cytokine and chemokine secretion. The similar pro-inflammatory signaling in liver and visceral adipose tissue indicated a potential immune crosstalk between the two tissues. Moreover, genes functionally related to liver antioxidant activity, mitochondrial function and extracellular matrix showed distinct expression between the two groups, indicating metabolic stress in the liver samples of transgenic pigs. We confirmed that triple-transgenic pigs closely recapitulate human metabolic diseases, especially in the scope of inflammatory signaling at the early stage of metaflammation. Taken together, this study provides a valuable large animal model for the clinical study of metaflammation and metabolic diseases.

Peer reviewed
A Low-Power Area-Efficient Precision Scalable Multiplier with an Input Vector Systolic Structure
In this paper, a small-area, low-power 64-bit integer multiplier is presented, which is suitable for portable devices or wireless applications. To save area and power, an input vector systolic (IVS) structure is proposed based on four 16-bit radix-8 Booth multipliers, together with a data input scheme that reduces the number of signal transitions. This structure is similar to a systolic array in the matrix-multiply units of a convolutional neural network (CNN), but it reduces the number of processing elements by 3/4 relative to the reference vector systolic accelerator. The comparison results show that the IVS multiplier reduces area by at least 61.9% and power by at least 45.18% over its counterparts. To increase hardware resource utilization, a Transverse Carry Array (TCA) structure for partial-product accumulation (PPA) was designed by replacing the 32-bit adders in the 16-bit multipliers with 3/17-bit adders. The experimental results show that this optimization leads to at least a 6.32% reduction in power consumption and a 13.65% reduction in area cost compared to the standard 16-bit radix-8 Booth multiplier. Finally, the precision scaling of the proposed IVS multiplier is discussed. Benefiting from the modular design, the IVS multiplier can be configured to support sixteen different kinds of multiplication in steps of 16 bits: [16b, 32b, 48b, 64b] × [16b, 32b, 48b, 64b].
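The modular decomposition that enables this precision scaling can be illustrated in software: a wide multiplication built purely from 16×16-bit partial products, with each operand's width selected in 16-bit steps. The sketch below shows the unsigned arithmetic only; Booth recoding, signedness handling, and the carry-array details of the actual circuit are omitted, and the function name is hypothetical.

```python
def mul_by_16bit_limbs(a, b, a_limbs=4, b_limbs=4):
    """Multiply two unsigned integers using only 16x16-bit partial
    products, mirroring how 16-bit multiplier blocks compose into a
    precision-scalable [16b..64b] x [16b..64b] unit. A software
    illustration of the arithmetic, not the circuit."""
    mask = 0xFFFF
    acc = 0
    for i in range(a_limbs):          # 16-bit slice of operand a
        ai = (a >> (16 * i)) & mask
        for j in range(b_limbs):      # 16-bit slice of operand b
            bj = (b >> (16 * j)) & mask
            acc += (ai * bj) << (16 * (i + j))  # shifted partial product
    return acc

a, b = 0x1234_5678_9ABC_DEF0, 0x0FED_CBA9_8765_4321
assert mul_by_16bit_limbs(a, b) == a * b   # 64b x 64b case
print(hex(mul_by_16bit_limbs(a, b)))
```

Choosing `a_limbs` and `b_limbs` from 1 to 4 selects among the sixteen supported operand-width combinations, which is the software analogue of the multiplier's configurability.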